Accurate prediction of disorder in protein chains with a comprehensive and empirically designed consensus.
Abstract
The availability of computational methods that predict disorder from protein sequences fuels rapid advancements in the protein disorder field. The most accurate predictions are usually obtained with consensus-based approaches. However, these consensuses are designed in an ad hoc manner. We perform a first-of-its-kind rational design in which we empirically search for an optimal mixture of base methods, selected from a comprehensive set of 20 modern predictors, and we explore several novel ways to build the consensus. Our method for the prediction of disorder based on Consensus of Predictors (disCoP) combines seven base methods, utilizes a custom-designed set of 11 selected features that aggregate base predictions over a sequence window, and uses binomial deviance loss-based regression to implement the consensus. Empirical tests performed on an independent benchmark set (with low sequence similarity to the proteins used to design disCoP) show that disCoP provides statistically significant improvements with at least a moderate magnitude of differences. disCoP outperforms 28 predictors, including other state-of-the-art consensuses, and achieves an Area Under the ROC Curve of 0.85 and a Matthews Correlation Coefficient of 0.5, compared with 0.83 and 0.48, respectively, for the best of the other considered approaches. Our consensus provides a high rate of correct disorder predictions, especially when a low rate of incorrect disorder predictions is desired. We are the first to comprehensively assess predictions in the context of several functional types of disorder, and we demonstrate that disCoP generates accurate predictions of disorder located at post-translational modification sites (in particular phosphorylation sites) and in autoregulatory and flexible linker regions. disCoP is available at http://biomine.ece.ualberta.ca/disCoP/.
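The general idea of a window-based consensus can be illustrated with a minimal sketch. This is not the disCoP implementation: the three base predictors, their scores, the labels, the window size, and all function names below are hypothetical, and the sketch uses a single window-mean feature per predictor rather than the 11 selected features mentioned in the abstract. Plain logistic regression fitted by gradient descent stands in here as one simple instance of regression that minimizes the binomial deviance. The sketch also computes the two reported quality measures, Area Under the ROC Curve (AUC) and Matthews Correlation Coefficient (MCC).

```python
import math

def window_mean(scores, half_width):
    """Average per-residue scores over a window centred on each residue."""
    n = len(scores)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out

def build_features(base_predictions, half_width=1):
    """One row per residue, one window-averaged column per base predictor."""
    smoothed = [window_mean(p, half_width) for p in base_predictions]
    return [list(row) for row in zip(*smoothed)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(X, w, b):
    """Consensus probability of disorder for each residue."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Minimise the mean binomial deviance (log loss) by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errors = [p - yi for p, yi in zip(predict(X, w, b), y)]
        b -= lr * sum(errors) / n
        w = [wj - lr * sum(e * row[j] for e, row in zip(errors, X)) / n
             for j, wj in enumerate(w)]
    return w, b

def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (0.0 if undefined)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(y_true, scores):
    """AUC as the fraction of (disordered, ordered) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical per-residue disorder scores from three base predictors
# for a 12-residue chain, with made-up labels (1 = disordered).
BASE_PREDICTIONS = [
    [0.1, 0.2, 0.1, 0.3, 0.2, 0.4, 0.7, 0.8, 0.9, 0.8, 0.9, 0.7],
    [0.2, 0.1, 0.3, 0.2, 0.6, 0.3, 0.6, 0.9, 0.7, 0.4, 0.8, 0.9],
    [0.3, 0.4, 0.2, 0.1, 0.3, 0.5, 0.5, 0.6, 0.8, 0.9, 0.7, 0.8],
]
LABELS = [0] * 6 + [1] * 6

X = build_features(BASE_PREDICTIONS, half_width=1)
w, b = fit_logistic(X, LABELS)
probs = predict(X, w, b)
print("MCC:", mcc(LABELS, [int(p >= 0.5) for p in probs]))
print("AUC:", auc(LABELS, probs))
```

The metrics here are computed on the same toy data used for fitting, so they only demonstrate the mechanics. The abstract's reported values come from an independent benchmark set with low sequence similarity to the design proteins.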
Journal title:
- Journal of biomolecular structure & dynamics
Volume 32, Issue 3
Pages -
Publication date: 2014